Exploratory Data Analysis: Offensive Efficiency in the Modern NBA
The modern NBA has steadily evolved toward higher offensive efficiency, with a clear acceleration in the last decade. Using league-average annual data from 1980–2025, I track Offensive Rating (ORtg), Pace, and 3-Point Attempt Rate (3PAr), then layer in attendance to capture COVID’s shock, and finally use weekly sports-betting stocks as a compact example of seasonal decomposition.
Augmented Dickey-Fuller Test
data: ts_ortg
Dickey-Fuller = -1.0264, Lag order = 3, p-value = 0.9233
alternative hypothesis: stationary
Code
p1 <- ggplot(df_ortg_decomp, aes(x = Year)) +
  geom_line(aes(y = Value, color = "Original"), size = 1) +
  geom_line(aes(y = Trend, color = "Trend"), size = 1.2) +
  scale_color_manual(values = c("Original" = "#006bb6", "Trend" = "#f58426")) +
  labs(title = "ORtg: Original Series vs. Trend (Additive Decomposition)", y = "Offensive Rating") +
  theme_minimal() +
  theme(legend.title = element_blank())
p2 <- ggplot(df_ortg_decomp, aes(x = Year, y = Irregular)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  geom_line(color = "#000000", size = 0.8) +
  geom_point(color = "#000000", size = 2) +
  labs(title = "ORtg: Irregular Component (Additive Residuals)", y = "Residual (points)") +
  theme_minimal()
p1 / p2
Code
par(mfrow = c(2, 1))
plot(ts_ortg, main = "Original ORtg Series", ylab = "ORtg", xlab = "Year")
plot(diff_ortg, main = "First Differenced ORtg Series", ylab = "Change in ORtg", xlab = "Year")
Code
acf_diff_ortg <- ggAcf(diff_ortg, lag.max = 20) +
  labs(title = "ACF of First Differenced ORtg") +
  theme_minimal()
pacf_diff_ortg <- ggPacf(diff_ortg, lag.max = 20) +
  labs(title = "PACF of First Differenced ORtg") +
  theme_minimal()
acf_diff_ortg / pacf_diff_ortg
Code
print(adf_diff_ortg)
Augmented Dickey-Fuller Test
data: diff_ortg
Dickey-Fuller = -3.174, Lag order = 3, p-value = 0.109
alternative hypothesis: stationary
ORtg, measured in points per 100 possessions, is the primary outcome. The long-run trend is unambiguously upward but non-linear: a slow climb through the 1980s–2000s, a pronounced step-up beginning around 2012, and continued gains into the post-COVID years. The slow ACF decay and the PACF spike at lag 1 point to a trending series, and the ADF test fails to reject a unit root in levels (p = 0.92). First differencing removes the visible trend, although the ADF test on the differenced series (p = 0.109) still falls short of the 5% threshold. With only 46 annual observations the test has limited power, so I treat d = 1 as a working assumption rather than a confirmed result. Variance is roughly constant over time, so an additive structure fits. A simple LOESS trend explains nearly all of the variation and leaves small residuals, which implies that the story is primarily one of structural trend rather than short-cycle oscillation.
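The additive split described above (series = trend + irregular) can be sketched as follows. This is a minimal illustration, not the code that produced `df_ortg_decomp`: the series is synthetic, the `span` value is an assumed smoothing choice, and only the column names (`Year`, `Value`, `Trend`, `Irregular`) are borrowed from the plotting code.

```r
# Synthetic stand-in for league-average ORtg (NOT real data):
# a slow linear climb plus a steeper slope after 2012, with noise.
set.seed(1)
years <- 1980:2025
value <- 104 + 0.15 * (years - 1980) +
  0.25 * pmax(years - 2012, 0) +
  rnorm(length(years), sd = 0.6)

# LOESS trend; span = 0.5 is an assumption, not the author's setting.
fit <- loess(value ~ years, span = 0.5)
trend <- as.numeric(predict(fit))

# Additive decomposition: the irregular part is whatever the trend leaves.
df_decomp <- data.frame(
  Year = years,
  Value = value,
  Trend = trend,
  Irregular = value - trend
)

# Share of the variance in Value that the trend accounts for.
trend_share <- 1 - var(df_decomp$Irregular) / var(df_decomp$Value)
print(round(trend_share, 3))
```

A `trend_share` close to 1 is what "explains nearly all of the variation" means in practice; a wider `span` gives a smoother trend and pushes more of the series into the irregular component.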
ggplot(df_pace, aes(x = Year, y = Value, color = Era)) +
  geom_line(size = 1.2) +
  geom_point(size = 3) +
  geom_vline(xintercept = 2012, linetype = "dashed", color = "#f58426", size = 1) +
  scale_color_manual(values = c(
    "Pre-Analytics Era" = "#006bb6",
    "Analytics Era" = "#f58426",
    "Post-COVID Era" = "#bec0c2"
  )) +
  labs(
    title = "NBA Pace (1980-2025): Possessions Per 48 Minutes",
    x = "Season",
    y = "Pace (Possessions per 48 min)",
    color = "Era"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11, color = "gray40"),
    legend.position = "bottom"
  )
Code
gglagplot(ts_pace, do.lines = FALSE, lags = 9) +
  ggtitle("Lag Plot of Pace") +
  theme_minimal()
Code
acf_pace <- ggAcf(ts_pace, lag.max = 20) +
  labs(title = "ACF of Pace") +
  theme_minimal()
pacf_pace <- ggPacf(ts_pace, lag.max = 20) +
  labs(title = "PACF of Pace") +
  theme_minimal()
acf_pace / pacf_pace
Code
print(adf_pace)
Augmented Dickey-Fuller Test
data: ts_pace
Dickey-Fuller = -1.4007, Lag order = 3, p-value = 0.8116
alternative hypothesis: stationary
Code
par(mfrow = c(2, 1))
plot(ts_pace, main = "Original Pace Series", ylab = "Pace", xlab = "Year")
plot(diff_pace, main = "First Differenced Pace Series", ylab = "Change in Pace", xlab = "Year")
Code
acf_diff_pace <- ggAcf(diff_pace, lag.max = 20) +
  labs(title = "ACF of First Differenced Pace") +
  theme_minimal()
pacf_diff_pace <- ggPacf(diff_pace, lag.max = 20) +
  labs(title = "PACF of First Differenced Pace") +
  theme_minimal()
acf_diff_pace / pacf_diff_pace
Code
print(adf_diff_pace)
Augmented Dickey-Fuller Test
data: diff_pace
Dickey-Fuller = -2.9769, Lag order = 3, p-value = 0.187
alternative hypothesis: stationary
Code
autoplot(ts_pace, series = "Original") +
  autolayer(ma_pace_3, series = "MA(3)") +
  autolayer(ma_pace_5, series = "MA(5)") +
  autolayer(ma_pace_10, series = "MA(10)") +
  scale_color_manual(
    values = c("Original" = "gray60", "MA(3)" = "#006bb6", "MA(5)" = "#f58426", "MA(10)" = "#000000"),
    breaks = c("Original", "MA(3)", "MA(5)", "MA(10)")
  ) +
  labs(
    title = "Pace: Moving Average Smoothing Comparison",
    subtitle = "U-shaped trajectory becomes clearer with increased smoothing",
    y = "Pace (possessions per 48 min)",
    x = "Season",
    color = "Series"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11, color = "gray40"),
    legend.position = "bottom"
  )
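Pace, the mediator in this story, follows a different trajectory: a classic U-shape. Possessions per 48 minutes decline from the fast basketball of the 1980s to a trough in the mid-2000s, then recover through the 2010s and 2020s. Importantly, the Pace recovery begins before the analytics inflection, which suggests it is not simply a byproduct of analytics. Like ORtg, Pace is non-stationary in levels (ADF p = 0.81). First differencing removes the trend visually, but the ADF test on the differences (p = 0.187) does not reject a unit root at the 5% level, so stationarity after one difference is plausible rather than demonstrated. Moving-average smoothers with 5–10 year windows make the U-shape especially clear. The contrast with ORtg matters: ORtg is already measured per 100 possessions and climbs throughout, while Pace falls and then recovers, so efficiency gains do not reduce to “more possessions”.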
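The smoothing step can be sketched in base R as below. The `ma_pace_*` objects plotted above are defined elsewhere, so this is only an illustration of the idea: `centered_ma` is a hypothetical helper, the data are synthetic, and the 2 x k treatment of even windows is an assumption about how the 10-year window is centered.

```r
# Centered moving average of order k.
# Odd k: equal weights. Even k: a 2 x k average (half weight on the two
# end points) so the window stays centered on the current year.
centered_ma <- function(x, k) {
  w <- if (k %% 2 == 1) rep(1 / k, k) else c(0.5, rep(1, k - 1), 0.5) / k
  stats::filter(x, w, sides = 2)
}

# Synthetic U-shaped stand-in for Pace (NOT real data), trough near 2005.
set.seed(2)
years <- 1980:2025
pace <- ts(91 + 0.03 * (years - 2005)^2 + rnorm(length(years), sd = 0.8),
           start = 1980)

ma_3 <- centered_ma(pace, 3)
ma_5 <- centered_ma(pace, 5)
ma_10 <- centered_ma(pace, 10)

# Year at which the most heavily smoothed series bottoms out.
trough_year <- time(ma_10)[which.min(ma_10)]
print(trough_year)
```

Wider windows trade detail for clarity: the 10-year average loses five seasons at each end of the sample, which is worth keeping in mind when reading the most recent years off the smoothed line.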
acf_3par <- ggAcf(ts_3par, lag.max = 20) +
  labs(title = "ACF of 3PAr") +
  theme_minimal()
pacf_3par <- ggPacf(ts_3par, lag.max = 20) +
  labs(title = "PACF of 3PAr") +
  theme_minimal()
acf_3par / pacf_3par
Code
print(adf_3par)
Augmented Dickey-Fuller Test
data: ts_3par
Dickey-Fuller = -1.3536, Lag order = 3, p-value = 0.8303
alternative hypothesis: stationary
Code
par(mfrow = c(2, 1))
plot(ts_3par, main = "Original 3PAr Series", ylab = "3PAr", xlab = "Year")
plot(diff_3par, main = "First Differenced 3PAr Series", ylab = "Change in 3PAr", xlab = "Year")
Code
acf_diff_3par <- ggAcf(diff_3par, lag.max = 20) +
  labs(title = "ACF of First Differenced 3PAr") +
  theme_minimal()
pacf_diff_3par <- ggPacf(diff_3par, lag.max = 20) +
  labs(title = "PACF of First Differenced 3PAr") +
  theme_minimal()
acf_diff_3par / pacf_diff_3par
Code
print(adf_diff_3par)
Augmented Dickey-Fuller Test
data: diff_3par
Dickey-Fuller = -3.5956, Lag order = 3, p-value = 0.04462
alternative hypothesis: stationary
Code
autoplot(ts_3par, series = "Original") +
  autolayer(ma_3par_3, series = "MA(3)") +
  autolayer(ma_3par_5, series = "MA(5)") +
  autolayer(ma_3par_10, series = "MA(10)") +
  scale_color_manual(
    values = c("Original" = "gray60", "MA(3)" = "#006bb6", "MA(5)" = "#f58426", "MA(10)" = "#000000"),
    breaks = c("Original", "MA(3)", "MA(5)", "MA(10)")
  ) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(
    title = "3-Point Attempt Rate: Moving Average Smoothing Comparison",
    subtitle = "Analytics revolution's exponential growth pattern clearly visible",
    y = "3-Point Attempt Rate (3PAr)",
    x = "Season",
    color = "Series"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11, color = "gray40"),
    legend.position = "bottom"
  )
The strongest structural break appears in 3PAr, which measures the share of shots taken from three. 3PAr rises modestly for decades and then accelerates sharply around 2012, about the same time ORtg takes off. Lag plots show strong positive relationships across lags, and ACF/PACF behavior again indicates a trending series: the ADF test does not reject a unit root in levels (p = 0.83) but does reject it after first differencing (p = 0.045). Smoothing highlights two regimes: a gradual era up to around 2012 and a rapid, near-exponential climb thereafter. This timing alignment supports the hypothesis that shot-selection modernization (spacing, threes above the break, rim attempts enabled by space) is tightly coupled to league-wide efficiency gains.
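One simple way to put numbers on the two regimes is a piecewise-linear fit with the break fixed at 2012. The sketch below is illustrative only: the data are synthetic, the break year is taken from the narrative rather than estimated, and this is a descriptive check, not a formal structural-break test.

```r
# Synthetic stand-in for 3PAr (NOT real data): slow growth, then a
# steeper slope after the assumed break year.
set.seed(3)
years <- 1980:2025
break_year <- 2012  # assumption: taken from the narrative, not estimated
par3 <- 0.02 + 0.006 * (years - 1980) +
  0.012 * pmax(years - break_year, 0) +
  rnorm(length(years), sd = 0.005)

df <- data.frame(
  Year = years,
  Value = par3,
  Post = pmax(years - break_year, 0)  # years elapsed since the break, else 0
)

# The Year coefficient is the pre-break slope; the Post coefficient is
# the change in slope once the break year has passed.
fit <- lm(Value ~ Year + Post, data = df)
slope_pre <- coef(fit)[["Year"]]
slope_post <- slope_pre + coef(fit)[["Post"]]
print(c(pre = slope_pre, post = slope_post))
```

On real data, a near-exponential climb would show up as curvature in the post-break residuals, in which case fitting the same model to the log of 3PAr may be the better description.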
ggplot(df_attendance, aes(x = Year, y = Value, color = Era)) +
  geom_line(size = 1.2) +
  geom_point(size = 3) +
  geom_vline(xintercept = 2020, linetype = "dashed", color = "red", size = 1) +
  annotate(
    "text",
    x = 2020, y = 24, label = "COVID-19\nPandemic (2020)",
    hjust = -0.05, color = "red", fontface = "bold", size = 3.5
  ) +
  annotate(
    "rect",
    xmin = 2020, xmax = 2021, ymin = 0, ymax = 25,
    alpha = 0.1, fill = "red"
  ) +
  scale_color_manual(values = c(
    "Pre-COVID" = "#006bb6",
    "COVID Era" = "#d62728",
    "Post-COVID Recovery" = "#2ca02c"
  )) +
  labs(
    title = "NBA Total Attendance (1990-2025): COVID-19 Disruption and Recovery",
    subtitle = "90% collapse in 2020-21 followed by gradual recovery",
    x = "Season",
    y = "Total Attendance (Millions)",
    color = "Era"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11, color = "gray40"),
    legend.position = "bottom"
  )
Code
gglagplot(ts_attendance, do.lines = FALSE, lags = 9) +
  ggtitle("Lag Plot of Total Attendance") +
  theme_minimal()
Code
acf_attendance <- ggAcf(ts_attendance, lag.max = 15) +
  labs(title = "ACF of Total Attendance") +
  theme_minimal()
pacf_attendance <- ggPacf(ts_attendance, lag.max = 15) +
  labs(title = "PACF of Total Attendance") +
  theme_minimal()
acf_attendance / pacf_attendance
Code
autoplot(ts_attendance, series = "Original") +
  autolayer(ma_attendance_3, series = "MA(3)") +
  autolayer(ma_attendance_5, series = "MA(5)") +
  scale_color_manual(
    values = c("Original" = "gray60", "MA(3)" = "#006bb6", "MA(5)" = "#f58426"),
    breaks = c("Original", "MA(3)", "MA(5)")
  ) +
  labs(
    title = "Attendance: Moving Average Smoothing (COVID Shock Visible)",
    subtitle = "Smoothing cannot remove the dramatic 2020-21 disruption",
    y = "Total Attendance (millions)",
    x = "Season",
    color = "Series"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11, color = "gray40"),
    legend.position = "bottom"
  )
Attendance provides the counterpoint: a stable pre-COVID plateau of roughly 21–22 million through 2019, a collapse in 2020–21 during the bubble and limited-capacity seasons, and a partial recovery that remains below the pre-pandemic ceiling. The sharp, short-window discontinuity is an unaccounted-for shock rather than a new equilibrium. Even with 3–5 year moving averages, the COVID impact is too large to smooth away.
Sports Betting Stocks
The NBA’s analytics revolution had profound effects beyond the court. As basketball became more quantifiable and predictable, it enabled a parallel transformation in sports betting. Companies like DraftKings and Penn Entertainment built businesses on the data infrastructure that analytics created. I briefly examine weekly stock prices for major betting operators to illustrate how this financial dimension connects to the on-court changes documented above.
autoplot(ts_penn, series = "Original") +
  autolayer(ma_penn_4, series = "MA(4 weeks)") +
  autolayer(ma_penn_13, series = "MA(13 weeks)") +
  autolayer(ma_penn_52, series = "MA(52 weeks)") +
  scale_color_manual(
    values = c("Original" = "gray60", "MA(4 weeks)" = "#006bb6", "MA(13 weeks)" = "#f58426", "MA(52 weeks)" = "#000000"),
    breaks = c("Original", "MA(4 weeks)", "MA(13 weeks)", "MA(52 weeks)")
  ) +
  labs(
    title = "PENN Stock: Moving Average Smoothing Comparison",
    subtitle = "Even annual smoothing cannot hide the structural collapse",
    y = "Stock Price ($)", x = "Year", color = "Series"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    plot.subtitle = element_text(size = 11, color = "gray40"),
    legend.position = "bottom"
  )
Code
acf_penn <- ggAcf(ts_penn, lag.max = 52) +
  labs(title = "ACF of PENN Weekly Stock Price") +
  theme_minimal()
pacf_penn <- ggPacf(ts_penn, lag.max = 52) +
  labs(title = "PACF of PENN Weekly Stock Price") +
  theme_minimal()
acf_penn / pacf_penn
Because annual NBA series are effectively non-seasonal, I include weekly sports-betting equities to demonstrate seasonality and multiplicative decomposition. DraftKings (DKNG), Penn (PENN), MGM, and Caesars (CZR) all show pandemic-era boom-bust dynamics on weekly data. Prices are non-stationary in levels and stationary in differences, and their volatility scales with price, which implies that a multiplicative model is the better fit for decomposition. DKNG exhibits a large run-up, correction, and stabilization, while PENN shows a sharper hype-driven spike and a deeper collapse.
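A multiplicative decomposition of a weekly series can be sketched with base R's `decompose()`. The prices below are synthetic, not DKNG or PENN data, and `frequency = 52` is an approximation, since a calendar year is slightly longer than 52 weeks.

```r
# Synthetic weekly "price" (NOT real stock data): exponential trend times
# a yearly seasonal factor times multiplicative noise.
set.seed(4)
n_weeks <- 52 * 5
week <- seq_len(n_weeks)
trend <- 20 * exp(0.004 * week)
seasonal <- 1 + 0.1 * sin(2 * pi * week / 52)
noise <- exp(rnorm(n_weeks, sd = 0.03))
price <- ts(trend * seasonal * noise, frequency = 52)  # 52 is an approximation

# Multiplicative model: price = trend * seasonal * random.
dec <- decompose(price, type = "multiplicative")

# The three components multiply back to the original wherever the
# centered trend is defined (it is NA for half a year at each end).
rebuilt <- dec$trend * dec$seasonal * dec$random
ok <- !is.na(rebuilt)
print(max(abs(rebuilt[ok] - price[ok])))

# Related view: taking logs turns the product into a sum, so an additive
# decomposition of log(price) describes the same structure.
log_dec <- decompose(log(price), type = "additive")
```

The multiplicative form suits series whose swings grow with the level, because the seasonal and irregular components are expressed as ratios around 1 rather than as fixed dollar amounts.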
Pulling the findings together: ORtg, Pace, 3PAr, and Attendance are all non-stationary in levels, and first differencing (d = 1) removes the trend in each, although among the ADF tests reported here only the one for 3PAr rejects a unit root at the 5% level after differencing. Because the variance of the NBA metrics is roughly constant in levels, additive decomposition is appropriate for them, while multiplicative decomposition fits the weekly equities, whose volatility scales with price. Short and medium moving-average windows clarify regime shifts: the 2012 analytics inflection in ORtg/3PAr, the mid-2000s trough and rebound in Pace, and the COVID intervention in Attendance.
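The difference-then-test workflow behind the ADF outputs above can be sketched as follows. The series is a synthetic random walk with drift, so its d = 1 behavior is built in, and `lag_order` is a hypothetical helper that reproduces the default lag rule of `tseries::adf.test`. The test itself runs only if the `tseries` package is installed.

```r
# Synthetic annual series (NOT real data): random walk with drift,
# which is non-stationary in levels by construction.
set.seed(5)
x <- ts(100 + cumsum(rnorm(46, mean = 0.2, sd = 1)), start = 1980)
dx <- diff(x)  # first difference: 45 observations

# Default lag order in tseries::adf.test is trunc((n - 1)^(1/3)),
# which gives 3 for both 46 and 45 observations.
lag_order <- function(n) trunc((n - 1)^(1 / 3))
print(c(levels = lag_order(length(x)), diffs = lag_order(length(dx))))

if (requireNamespace("tseries", quietly = TRUE)) {
  adf_level <- tseries::adf.test(x)   # H0: unit root (non-stationary)
  adf_diff <- tseries::adf.test(dx)
  # A unit root is rejected at the 5% level only when p < 0.05.
  print(c(level = adf_level$p.value, diff = adf_diff$p.value))
}
```

With fewer than 50 observations the test often fails to reject even for a series that is stationary after one difference, which is one reason to read the p-values alongside the ACF/PACF plots rather than on their own.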